Noise robust feature for automatic speech recognition based on mel-spectrogram gradient histogram

Authors

Taejin Park

Seungkwon Beack

Taejin Lee

Abstract

This paper proposes an alternative scheme for extracting speech features in an automatic speech recognition (ASR) system. If an ASR system is trained on clean speech, a noisy environment can cause a mismatch between the features extracted from the recognition data and those extracted from the training data, and this mismatch degrades recognition accuracy. An approach that, unlike existing speech features, minimizes the mismatch between clean and noisy speech features is therefore needed. In this paper, we propose a feature extraction technique that is robust to noisy environments. The proposed scheme is based on a weighted histogram of the time-frequency gradient in a Mel-spectrogram image. Unlike previous approaches that use the magnitude of a Mel-spectrogram, we use both the angle and the magnitude of the local gradient by employing a weighted histogram. As a result, the proposed speech feature shows a lower mean square error (MSE) between clean-condition and noisy-condition features than other well-known speech features. In addition, the proposed scheme improves the results of the word recognition test in a noisy environment while using relatively fewer coefficients than similar studies.
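The abstract gives no implementation details, so the following is only a rough sketch of what a magnitude-weighted histogram of local time-frequency gradient angles could look like, together with the MSE used to compare clean and noisy features. The function names, the number of bins, the finite-difference gradient, the use of unsigned orientation, and the per-frame pooling over all bands are assumptions made for illustration, not the authors' method.

```python
import math


def gradient_histogram(spec, n_bins=8):
    """Per-frame, magnitude-weighted histogram of local gradient angles.

    spec[f][t] is assumed to hold the log-Mel energy of band f at frame t.
    Returns one L1-normalised histogram of length n_bins per frame.
    """
    n_bands, n_frames = len(spec), len(spec[0])
    features = []
    for t in range(n_frames):
        hist = [0.0] * n_bins
        for f in range(n_bands):
            # Finite differences: central inside, one-sided at the borders.
            t0, t1 = max(t - 1, 0), min(t + 1, n_frames - 1)
            f0, f1 = max(f - 1, 0), min(f + 1, n_bands - 1)
            g_time = (spec[f][t1] - spec[f][t0]) / max(t1 - t0, 1)
            g_freq = (spec[f1][t] - spec[f0][t]) / max(f1 - f0, 1)
            magnitude = math.hypot(g_time, g_freq)
            # Unsigned orientation folded into [0, pi) (an assumption).
            angle = math.atan2(g_freq, g_time) % math.pi
            bin_index = min(int(angle / math.pi * n_bins), n_bins - 1)
            # Each vote is weighted by the gradient magnitude.
            hist[bin_index] += magnitude
        total = sum(hist)
        features.append([h / total for h in hist] if total > 0 else hist)
    return features


def mean_square_error(feats_a, feats_b):
    """Mean squared difference over all coefficients of two feature sets."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(feats_a, feats_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)
```

Under these assumptions, calling `mean_square_error(gradient_histogram(clean), gradient_histogram(noisy))` on a clean log-Mel spectrogram and its noisy counterpart gives the kind of clean-versus-noisy feature distance the abstract refers to.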

Similar resources

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

A new method for robust speech recognition based on missing data using a bidirectional neural network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...

Gradient Based Spectral Peak Location for Noise Robust Speech Recognition

In this paper, a gradient-based algorithm for finding spectral peak locations is presented. The algorithm makes use of gradient and acceleration locations in the spectrogram for locating the peaks. Frequency gradients and accelerations are used to locate the peaks. The results are then interpolated to yield a smooth peak envelope. The method is evaluated in the Aurora framework. A first pass locates all ...

Spectral maxima representation for robust automatic speech recognition

In the context of automatic speech recognition, the popular Mel-frequency cepstral coefficients (MFCCs), although they perform very well as features under clean and matched environments, are observed to fail in mismatched conditions. Spectral maxima are often observed to preserve their locations and energies under noisy environments, but are not represented explicitly by the MFCC features. This paper p...

A Robust Front-End Processor combining Mel Frequency Cepstral Coefficient and Sub-band Spectral Centroid Histogram methods for Automatic Speech Recognition

Environmental robustness is an important area of research in speech recognition. Mismatch between trained speech models and the actual speech to be recognized arises from factors such as background noise. It can severely degrade the accuracy of recognizers based on commonly used features such as Mel-frequency cepstral coefficients (MFCCs) and linear predictive coding (LPC). It is well u...

Publication date: 2014
